Feat: Select backend devices via arg (+ add RPC backend support) #1184
stduhpf wants to merge 23 commits into leejet:master from
Conversation
Maybe the backend #if tests in model.cpp, upscaler.cpp, etc. should be changed to runtime tests, too? Also: how hard would it be to support more than one backend with the same sd.cpp binaries - Vulkan and CUDA, for instance?
Good point.
I think removing those #if tests and figuring out a way to build GGML with multiple backends should be enough? Edit: Actually I'm not sure if the #if tests in model.cpp are necessary at all. I could still build with Vulkan enabled when removing those.
I believe it's leftover code. The SD_USE_FLASH_ATTENTION one in common.hpp (the top one), qwen_image.hpp and z_image.hpp are trickier: they test for Vulkan for precision issues. z_image.hpp also has a
I'm pretty sure ggml has runtime checks for the backend type. It would probably be better to use that instead.
So sd.cpp actually supports multi-backend builds? Like SYCL+CUDA at the same time?
@CarlGao4 I'm not sure. I never successfully managed to build sd.cpp with multiple backends, but ggml should be able to handle that. I got it to build with both Vulkan and RPC, but it failed to send data to the RPC server, so I don't know if it would work with other backends (I had to add a way to connect to the RPC server via the CLI). Edit: actually RPC works if
Support for multiple different backends can be achieved for third-party callers simply by switching to the DLL/SO that supports the desired backend.
@wbruna I managed to reproduce the garbage at the end only once in my many tests. I'm not sure what's going on there.
I've just realized I accidentally included my RPC-related changes in the last commit. Since it's somewhat related, should I leave them in, or should I keep that for a follow-up PR? Edit: I'm leaving them in
@wbruna I think it's a good idea to use https://github.com/cpm-cmake/CPM.cmake to manage third-party dependencies in your PR.
@Cyberhan123, you meant @stduhpf :-) But I think that could be interesting regardless of this PR.
The RPC backend doesn't seem to handle parallel tensor loading very well. For now I've added a workaround that disables multi-threading when loading tensors with any RPC backend enabled, but maybe this could be reworked so that only tensors that need to be sent to RPC are loaded sequentially.
fix sdxl conditioner backends
fix sd3 backend display
```cpp
std::string main_backend_device;
std::string diffusion_backend_device;
std::string clip_backend_device;
std::string vae_backend_device;
std::string tae_backend_device;
std::string control_net_backend_device;
std::string upscaler_backend_device;
std::string photomaker_backend_device;
std::string vision_backend_device;
```
It's best to use ggml_backend_t, since this avoids losing access to the underlying primitives. For third-party callers like myself, we can directly use ggml_backend_init_best to get the best backend.
I believe leejet does not want anything ggml-related in stable-diffusion.h or in the examples; it should all be abstracted away. That being said, we could maybe just create a sd_backend_device as a wrapper for ggml_backend_t, and expose functions to get the backend from a name or to select the best backend?
```cpp
const char* main_device;
const char* diffusion_device;
const char* clip_device;
const char* vae_device;
const char* tae_device;
const char* control_net_device;
const char* photomaker_device;
const char* vision_device;
```
Same as above: use ggml_backend_t.
```cpp
#ifdef SD_USE_VULKAN
if (ggml_backend_is_vk(ctx->backend)) {
    to_out_0->set_force_prec_f32(true);
}
#endif
```
In this case, we delegate the abstraction to ggml, allowing it to dynamically load the backend, so we don't need:

```cpp
#ifdef SD_USE_VULKAN
#include "ggml-vulkan.h"
#endif
```
We can add a helper function:

```cpp
// test if the backend is a specific one, e.g. "CUDA", "ROCm", "Vulkan" etc.
static inline bool sd_backend_is(ggml_backend_t backend, const std::string& name) {
    ggml_backend_dev_t dev = ggml_backend_get_device(backend);
    if (!dev) return false;
    std::string dev_name = ggml_backend_dev_name(dev);
    return dev_name.find(name) != std::string::npos;
}
```

I don't really like the idea of using string comparisons for this kind of thing, but I guess it does make things simpler in that case.
```cpp
#ifdef SD_USE_VULKAN
if (ggml_backend_is_vk(ctx->backend)) {
    net_2->set_force_prec_f32(true);
}
#endif
```
```cpp
#ifdef SD_USE_VULKAN
#include "ggml-vulkan.h"
#endif
```
```cpp
#if GGML_USE_HIP
// Prevent NaN issues with certain ROCm setups
if (ggml_backend_is_cuda(ctx->backend)) {
    out_proj->set_scale(1.f / 16.f);
}
#endif
```
Same here: we can handle this check through abstraction instead of including fixed backend header files.
```cmake
option(SD_CUDA    "sd: cuda backend"   OFF)
option(SD_HIPBLAS "sd: rocm backend"   OFF)
option(SD_METAL   "sd: metal backend"  OFF)
option(SD_VULKAN  "sd: vulkan backend" OFF)
option(SD_OPENCL  "sd: opencl backend" OFF)
option(SD_SYCL    "sd: sycl backend"   OFF)
option(SD_MUSA    "sd: musa backend"   OFF)
```
In this case, I mean that after we completely eliminate the backend header file inclusion, we can directly use the GGML definition.
That would be out of scope for this PR I guess, but maybe.
The main goal of this PR is to improve the user experience in multi-GPU setups, allowing users to choose which model part gets sent to which device.

CLI changes:

- New `--main-backend-device [device_name]` argument to set the default backend
- The `--clip-on-cpu`, `--vae-on-cpu` and `--control-net-cpu` arguments are replaced by the `--clip_backend_device [device_name]`, `--vae-backend-device [device_name]` and `--control-net-backend-device [device_name]` arguments
- New `--diffusion_backend_device` (controls the device used for the diffusion/flow models) and `--tae-backend-device` arguments
- New `--upscaler-backend-device`, `--photomaker-backend-device`, and `--vision-backend-device` arguments
- New `--list-devices` argument to print the list of available ggml devices and exit
- New `--rpc` argument to connect to a compatible GGML RPC server

C API changes (stable-diffusion.h):

- New device fields in the `sd_ctx_params_t` struct
- New `void list_backends_to_buffer(char* buffer, size_t buffer_size)` function to write the details of the available devices to a null-terminated char array. Devices are separated by newline characters (`\n`), and the name and description of each device are separated by a `\t` character.
- New `size_t backend_list_size()` function to get the size of the buffer needed for `list_backends_to_buffer`
- New `void add_rpc_device(const char* address);` to connect to a ggml RPC backend (from llama.cpp)

The default device selection should now consistently prioritize discrete GPUs over iGPUs.
For example, if you want to run the text encoders on CPU, you'd now need to use `--clip_backend_device CPU` instead of `--clip-on-cpu`.

TODO:

Important: to use RPC, you need to add `-DSD_RPC=ON` to the build. Additionally, it requires either sd.cpp to be built with the `-DSD_USE_SYSTEM_GGML` flag (I haven't tested that one), or the RPC server to be built with `-DCMAKE_C_FLAGS="-DGGML_MAX_NAME=128" -DCMAKE_CXX_FLAGS="-DGGML_MAX_NAME=128"` (the default is 64).

Fixes #1116